In [1]:
!pip install folium plotly
Requirement already satisfied: folium in /home/sunny/anaconda3/lib/python3.7/site-packages (0.10.0)
Requirement already satisfied: plotly in /home/sunny/anaconda3/lib/python3.7/site-packages (4.3.0)
Requirement already satisfied: requests in /home/sunny/anaconda3/lib/python3.7/site-packages (from folium) (2.22.0)
Requirement already satisfied: branca>=0.3.0 in /home/sunny/anaconda3/lib/python3.7/site-packages (from folium) (0.3.1)
Requirement already satisfied: jinja2>=2.9 in /home/sunny/anaconda3/lib/python3.7/site-packages (from folium) (2.10.1)
Requirement already satisfied: numpy in /home/sunny/anaconda3/lib/python3.7/site-packages (from folium) (1.16.4)
Requirement already satisfied: retrying>=1.3.3 in /home/sunny/anaconda3/lib/python3.7/site-packages (from plotly) (1.3.3)
Requirement already satisfied: six in /home/sunny/anaconda3/lib/python3.7/site-packages (from plotly) (1.12.0)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/sunny/anaconda3/lib/python3.7/site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /home/sunny/anaconda3/lib/python3.7/site-packages (from requests->folium) (2.8)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /home/sunny/anaconda3/lib/python3.7/site-packages (from requests->folium) (1.24.2)
Requirement already satisfied: certifi>=2017.4.17 in /home/sunny/anaconda3/lib/python3.7/site-packages (from requests->folium) (2019.6.16)
Requirement already satisfied: MarkupSafe>=0.23 in /home/sunny/anaconda3/lib/python3.7/site-packages (from jinja2>=2.9->folium) (1.1.1)
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
import folium
import plotly.graph_objects as go
import plotly

from bs4 import BeautifulSoup
import requests
from IPython.display import Image, display


mapbox_access_token = 'pk.eyJ1Ijoic3VubnkyMzA5IiwiYSI6ImNrMnlvNGRoMzA5YW0zY29tMGVhaWNwa2YifQ.E6iZyoNa1Inbf93yh-kjhw'

pd.set_option('display.max_columns',120)
pd.set_option('display.max_rows',100)

%matplotlib inline

Exercise #1 - Data Cleaning

Step 1.1 - Read in the data. Create a table of the unique values of Category. Which values do you believe represent a restaurant? Explain which categories you chose in a markdown cell. Create a single dummy variable for restaurant that combines multiple values from Category.

In [3]:
food_inspection = pd.read_csv('Food_Inspection (1).csv')
#print('Columns : ',food_inspection.columns)
print('Food Inspection Dataframe Size : ',food_inspection.shape)
food_inspection.head()
Food Inspection Dataframe Size :  (13701, 31)
Out[3]:
Establishment ID Name Address 1 Address 2 City Zip Inspection Date Inspection Results Food from Approved Source (C) Food Protected from Contamination (C) Ill Workers Restricted (C) Proper Hand Washing (C) Cooling Time and Temperature (C) Cold Holding Temperature (C) Hot Holding Temperature (C) Cooking Time and Temperature (C) Reheating Time and Temperature (C) Hot and Cold Running Water Provided (C) Proper Sewage Disposal (C) Toxic Substances & Pesticides Rodent and Insects Nutritional Labeling Trans Fat Ban No-Smoking Sign Posted Inspection Type Owner Category Type Latitude Longitude Location
0 18628 SUBWAY #22916 12193 DARNESTOWN RD. NaN GAITHERSBURG 20878 12/19/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Monitoring P. N. ENTERPRISES, LLC Restaurant Food 39.1179 -77.2524 (39.1179, -77.2524)
1 1059 GIANT FOOD #0192 18250 FLOWER HILL WAY NaN GAITHERSBURG 20879 12/19/2018 Critical Violations Corrected In Compliance In Compliance In Compliance Out of Compliance Out of Compliance Out of Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Comprehensive GIANT OF MARYLAND, LLC Market Food 39.1551 -77.1659 (39.1551, -77.1659)
2 20933 PIZZA BOLIS 7831 EASTERN AVE. NaN SILVER SPRING 20910 12/19/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Monitoring K.M.MG INC Restaurant Food 38.9853 -77.0273 (38.9853, -77.0273)
3 22217 GREENE TURTLE 18101 TOWN CENTER DR NaN OLNEY 20832 12/13/2017 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Comprehensive OLNEY GT, LLC. Restaurant Food 39.1519 -77.0619 (39.1519, -77.0619)
4 28408 CRAVE 7101 DEMOCRACY BLVD SPACE #FC BETHESDA 20817 12/13/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Out of Compliance Not applicable In Compliance In Compliance Monitoring CRAVE RESTAURANT BETHESDA LLC Restaurant Food 39.0221 -77.1471 (39.0221, -77.1471)
In [4]:
food_inspection[['Category']].drop_duplicates()
Out[4]:
Category
0 Restaurant
1 Market
13 Caterer
18 Non-Profit
23 Public School- Other
27 NaN
28 Public School- Elementary
33 Hospital
36 Public School- Middle
37 Assisted Living
42 Private School
67 Nursing Home
100 Public School- High
116 Carry Out
307 Farmers Market
928 Snack Bar
1472 Institution
1499 Excluded Organization
1567 Seasonal
7441 Mobile Unit

We have created another column named Retaurant which has value True if entry is for restaurant else False. We have selected entry Restaurant as retaurant type to be considered.

In [5]:
list_of_restaurants = ['Fast Food', 'Restaurant', 'Cafeteria', 'Seafood', 'Buffet', 'Diner','Pizza']
#list_of_restaurants = ['Restaurant']
food_inspection['Restaurant'] = [True if val in list_of_restaurants else False for val in food_inspection['Category'].values]
In [6]:
#food_inspection[food_inspection.Restaurant == True]['Establishment ID'].value_counts()
In [7]:
food_inspection.head()
Out[7]:
Establishment ID Name Address 1 Address 2 City Zip Inspection Date Inspection Results Food from Approved Source (C) Food Protected from Contamination (C) Ill Workers Restricted (C) Proper Hand Washing (C) Cooling Time and Temperature (C) Cold Holding Temperature (C) Hot Holding Temperature (C) Cooking Time and Temperature (C) Reheating Time and Temperature (C) Hot and Cold Running Water Provided (C) Proper Sewage Disposal (C) Toxic Substances & Pesticides Rodent and Insects Nutritional Labeling Trans Fat Ban No-Smoking Sign Posted Inspection Type Owner Category Type Latitude Longitude Location Restaurant
0 18628 SUBWAY #22916 12193 DARNESTOWN RD. NaN GAITHERSBURG 20878 12/19/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Monitoring P. N. ENTERPRISES, LLC Restaurant Food 39.1179 -77.2524 (39.1179, -77.2524) True
1 1059 GIANT FOOD #0192 18250 FLOWER HILL WAY NaN GAITHERSBURG 20879 12/19/2018 Critical Violations Corrected In Compliance In Compliance In Compliance Out of Compliance Out of Compliance Out of Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Comprehensive GIANT OF MARYLAND, LLC Market Food 39.1551 -77.1659 (39.1551, -77.1659) False
2 20933 PIZZA BOLIS 7831 EASTERN AVE. NaN SILVER SPRING 20910 12/19/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Monitoring K.M.MG INC Restaurant Food 38.9853 -77.0273 (38.9853, -77.0273) True
3 22217 GREENE TURTLE 18101 TOWN CENTER DR NaN OLNEY 20832 12/13/2017 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Not applicable In Compliance In Compliance Comprehensive OLNEY GT, LLC. Restaurant Food 39.1519 -77.0619 (39.1519, -77.0619) True
4 28408 CRAVE 7101 DEMOCRACY BLVD SPACE #FC BETHESDA 20817 12/13/2018 No Critical Violations Noted In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance In Compliance Out of Compliance Not applicable In Compliance In Compliance Monitoring CRAVE RESTAURANT BETHESDA LLC Restaurant Food 39.0221 -77.1471 (39.0221, -77.1471) True

Step 1.2 - Convert the Inspection_date column into a datetime column. Create a new column for the year of the inspection. Create a new column for the month of the inspection. Create a column for the year and month.

In [8]:
print('Food Inspection Date Datatype Before', food_inspection['Inspection Date'].dtype)
food_inspection['Inspection Date'] = pd.to_datetime(food_inspection['Inspection Date'])
print('Food Inspection Date Datatype Before', food_inspection['Inspection Date'].dtype)

food_inspection['Inspection_year'] = [dt.year for dt in food_inspection['Inspection Date']]
food_inspection['Inspection_month'] = [dt.month for dt in food_inspection['Inspection Date']]
food_inspection['Inspection_year_month'] = food_inspection['Inspection_month'].map(str) + '_' + food_inspection['Inspection_year'].map(str)
Food Inspection Date Datatype Before object
Food Inspection Date Datatype Before datetime64[ns]
In [9]:
food_inspection[['Inspection Date', 'Inspection_year', 'Inspection_month', 'Inspection_year_month']]
Out[9]:
Inspection Date Inspection_year Inspection_month Inspection_year_month
0 2018-12-19 2018 12 12_2018
1 2018-12-19 2018 12 12_2018
2 2018-12-19 2018 12 12_2018
3 2017-12-13 2017 12 12_2017
4 2018-12-13 2018 12 12_2018
5 2018-12-19 2018 12 12_2018
6 2018-12-18 2018 12 12_2018
7 2017-12-12 2017 12 12_2017
8 2017-12-12 2017 12 12_2017
9 2018-12-18 2018 12 12_2018
10 2017-12-12 2017 12 12_2017
11 2017-12-13 2017 12 12_2017
12 2017-12-13 2017 12 12_2017
13 2018-12-18 2018 12 12_2018
14 2018-12-13 2018 12 12_2018
15 2018-12-19 2018 12 12_2018
16 2018-12-18 2018 12 12_2018
17 2018-12-19 2018 12 12_2018
18 2017-12-12 2017 12 12_2017
19 2018-12-18 2018 12 12_2018
20 2018-12-19 2018 12 12_2018
21 2018-12-19 2018 12 12_2018
22 2017-12-12 2017 12 12_2017
23 2017-12-12 2017 12 12_2017
24 2017-12-13 2017 12 12_2017
25 2017-12-13 2017 12 12_2017
26 2018-12-17 2018 12 12_2018
27 2018-12-19 2018 12 12_2018
28 2018-12-17 2018 12 12_2018
29 2018-12-18 2018 12 12_2018
30 2018-12-19 2018 12 12_2018
31 2018-12-14 2018 12 12_2018
32 2018-12-18 2018 12 12_2018
33 2017-12-13 2017 12 12_2017
34 2018-12-13 2018 12 12_2018
35 2017-12-12 2017 12 12_2017
36 2018-12-14 2018 12 12_2018
37 2018-12-18 2018 12 12_2018
38 2018-12-19 2018 12 12_2018
39 2018-12-14 2018 12 12_2018
40 2018-12-19 2018 12 12_2018
41 2018-12-19 2018 12 12_2018
42 2018-12-15 2018 12 12_2018
43 2018-12-19 2018 12 12_2018
44 2018-12-14 2018 12 12_2018
45 2018-12-19 2018 12 12_2018
46 2018-12-19 2018 12 12_2018
47 2018-12-18 2018 12 12_2018
48 2017-12-11 2017 12 12_2017
49 2018-12-19 2018 12 12_2018
... ... ... ... ...
13651 2018-12-18 2018 12 12_2018
13652 2018-02-26 2018 2 2_2018
13653 2018-12-17 2018 12 12_2018
13654 2017-09-06 2017 9 9_2017
13655 2018-09-27 2018 9 9_2018
13656 2018-10-30 2018 10 10_2018
13657 2018-05-04 2018 5 5_2018
13658 2018-12-17 2018 12 12_2018
13659 2018-12-18 2018 12 12_2018
13660 2018-09-21 2018 9 9_2018
13661 2018-04-13 2018 4 4_2018
13662 2018-12-18 2018 12 12_2018
13663 2018-12-18 2018 12 12_2018
13664 2017-11-09 2017 11 11_2017
13665 2018-12-18 2018 12 12_2018
13666 2018-05-16 2018 5 5_2018
13667 2018-12-18 2018 12 12_2018
13668 2017-12-20 2017 12 12_2017
13669 2018-12-18 2018 12 12_2018
13670 2018-12-18 2018 12 12_2018
13671 2017-11-14 2017 11 11_2017
13672 2018-12-18 2018 12 12_2018
13673 2018-12-18 2018 12 12_2018
13674 2018-06-26 2018 6 6_2018
13675 2018-01-18 2018 1 1_2018
13676 2018-12-17 2018 12 12_2018
13677 2018-04-04 2018 4 4_2018
13678 2017-10-05 2017 10 10_2017
13679 2018-12-18 2018 12 12_2018
13680 2018-12-18 2018 12 12_2018
13681 2018-12-18 2018 12 12_2018
13682 2017-12-07 2017 12 12_2017
13683 2018-01-09 2018 1 1_2018
13684 2018-12-18 2018 12 12_2018
13685 2018-12-18 2018 12 12_2018
13686 2018-04-03 2018 4 4_2018
13687 2018-06-07 2018 6 6_2018
13688 2017-10-23 2017 10 10_2017
13689 2018-12-17 2018 12 12_2018
13690 2018-12-18 2018 12 12_2018
13691 2018-12-18 2018 12 12_2018
13692 2018-12-17 2018 12 12_2018
13693 2018-03-20 2018 3 3_2018
13694 2018-11-09 2018 11 11_2018
13695 2017-10-26 2017 10 10_2017
13696 2018-12-18 2018 12 12_2018
13697 2017-10-10 2017 10 10_2017
13698 2018-12-17 2018 12 12_2018
13699 2018-05-10 2018 5 5_2018
13700 2018-10-03 2018 10 10_2018

13701 rows × 4 columns

Step 1.3 - For each column with the type of compliance, e.g. "Rodent and Insects," create a dummy variable that is 1 if the establishment is out of compliance and 0 otherwise. Use np.nan for not applicable. Drop the string columns (retain only the dummy variables).

In [10]:
compliance_type_columns = ['Food from Approved Source (C)', 'Food Protected from Contamination (C)', 'Ill Workers Restricted (C)', \
                           'Proper Hand Washing (C)', 'Cooling Time and Temperature (C)', 'Cold Holding Temperature (C)',\
                          'Hot Holding Temperature (C)', 'Cooking Time and Temperature (C)', 'Reheating Time and Temperature (C)',\
                          'Hot and Cold Running Water Provided (C)', 'Proper Sewage Disposal (C)', 'Toxic Substances & Pesticides', \
                          'Rodent and Insects', 'Nutritional Labeling', 'Trans Fat Ban', 'No-Smoking Sign Posted']
for col in compliance_type_columns:
    print(col,' : ',food_inspection[col].unique())

for col in compliance_type_columns:
    food_inspection[col] = [(1 if val=='Out of Compliance' else 0 if val=='In Compliance' else np.nan ) for val in food_inspection[col]]
Food from Approved Source (C)  :  ['In Compliance' 'Out of Compliance']
Food Protected from Contamination (C)  :  ['In Compliance' 'Out of Compliance']
Ill Workers Restricted (C)  :  ['In Compliance' 'Out of Compliance']
Proper Hand Washing (C)  :  ['In Compliance' 'Out of Compliance']
Cooling Time and Temperature (C)  :  ['In Compliance' 'Out of Compliance']
Cold Holding Temperature (C)  :  ['In Compliance' 'Out of Compliance']
Hot Holding Temperature (C)  :  ['In Compliance' 'Out of Compliance']
Cooking Time and Temperature (C)  :  ['In Compliance' 'Out of Compliance']
Reheating Time and Temperature (C)  :  ['In Compliance' 'Out of Compliance']
Hot and Cold Running Water Provided (C)  :  ['In Compliance' 'Out of Compliance']
Proper Sewage Disposal (C)  :  ['In Compliance' 'Out of Compliance']
Toxic Substances & Pesticides  :  ['In Compliance' 'Out of Compliance']
Rodent and Insects  :  ['In Compliance' 'Out of Compliance']
Nutritional Labeling  :  ['Not applicable' 'In Compliance']
Trans Fat Ban  :  ['In Compliance' 'Out of Compliance']
No-Smoking Sign Posted  :  ['In Compliance' 'Out of Compliance']
In [11]:
food_inspection[compliance_type_columns]
Out[11]:
Food from Approved Source (C) Food Protected from Contamination (C) Ill Workers Restricted (C) Proper Hand Washing (C) Cooling Time and Temperature (C) Cold Holding Temperature (C) Hot Holding Temperature (C) Cooking Time and Temperature (C) Reheating Time and Temperature (C) Hot and Cold Running Water Provided (C) Proper Sewage Disposal (C) Toxic Substances & Pesticides Rodent and Insects Nutritional Labeling Trans Fat Ban No-Smoking Sign Posted
0 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
1 0 0 0 1 1 1 0 0 0 0 0 0 0 NaN 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
6 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
8 0 0 0 0 0 1 0 0 0 0 0 0 1 NaN 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13 0 1 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
14 0 1 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
16 0 0 0 0 1 0 0 0 0 0 0 0 1 NaN 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
19 0 0 0 0 0 1 0 1 0 0 0 0 1 NaN 0 0
20 0 0 0 0 0 1 1 0 0 0 0 0 0 NaN 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
22 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
24 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
26 0 0 0 0 0 1 1 0 0 0 0 0 0 NaN 0 0
27 0 0 0 0 0 0 0 0 0 1 0 0 0 NaN 0 0
28 0 0 0 1 0 1 0 0 0 0 0 0 0 NaN 0 0
29 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
30 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
31 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
32 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
33 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
34 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
35 0 0 0 0 0 1 0 0 0 0 0 0 1 NaN 0 0
36 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
37 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
39 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
40 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
41 0 1 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
42 0 0 0 1 0 0 0 0 0 0 0 0 0 NaN 1 1
43 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
44 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
45 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
46 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
47 0 0 0 0 0 1 0 0 0 0 0 0 1 NaN 0 0
48 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
49 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13651 0 0 0 0 0 1 1 0 0 0 0 0 0 NaN 0 0
13652 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13653 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13654 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
13655 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13656 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13657 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
13658 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13659 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13660 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13661 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13662 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
13663 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13664 0 0 0 1 0 1 0 0 0 0 0 0 0 NaN 0 0
13665 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13666 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13667 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13668 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
13669 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
13670 0 1 0 0 0 1 1 0 0 0 0 0 0 NaN 0 0
13671 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13672 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
13673 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13674 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13675 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 0 0
13676 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
13677 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13678 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13679 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
13680 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13681 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13682 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13683 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13684 0 0 0 0 0 0 0 0 0 0 0 0 1 NaN 1 0
13685 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13686 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13687 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13688 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13689 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13690 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13691 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13692 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13693 0 0 0 0 0 1 0 0 0 0 0 0 0 NaN 0 0
13694 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
13695 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13696 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0
13697 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
13698 0 0 0 0 0 0 1 0 0 0 0 0 0 NaN 0 0
13699 0 0 0 0 0 1 0 0 0 0 0 0 1 NaN 0 0
13700 0 0 0 0 0 0 0 0 0 0 0 0 0 NaN 0 0

13701 rows × 16 columns

Step 1.4 - Create a new column that contains the number of violations for that inspection (the number of categories where the establishment was not in compliance). Create a dummy variable that is 1 if the establishment is out of compliance in any category.

In [12]:
food_inspection['Num_of_violations'] = np.nansum(food_inspection[compliance_type_columns].values, axis=1)
food_inspection['Num_of_violations'][:10]
Out[12]:
0    0.0
1    3.0
2    0.0
3    0.0
4    1.0
5    1.0
6    1.0
7    1.0
8    2.0
9    0.0
Name: Num_of_violations, dtype: float64
In [13]:
food_inspection['Out_of_compliance'] = [1 if val>0 else 0 for val in food_inspection['Num_of_violations']]
food_inspection['Out_of_compliance'][:10]
Out[13]:
0    0
1    1
2    0
3    0
4    1
5    1
6    1
7    1
8    1
9    0
Name: Out_of_compliance, dtype: int64

Step 1.5 - For establishments with multiple inspections, create a new DataFrame in wide format. Keep only the establishment ID, Category, Inspection_date, and number of violations. Make sure category is consistent within ID and resolve any discrepancies if necessary (i.e. each establishment has only one category). Reshape from long to wide (pivot) such that each establishment is a row and you have a column for the date and number of violations for inspection 1, inspection 2, inspection 3, etc.

In [14]:
establishment_id_to_inspection_cnt = Counter(food_inspection[food_inspection.Restaurant == True]['Establishment ID'])
establishment_id_to_inspection_cnt = dict([(key,val) for key, val in establishment_id_to_inspection_cnt.items() if val > 1])

food_inspection_multi_insp = food_inspection[food_inspection['Establishment ID'].isin(establishment_id_to_inspection_cnt.keys())].copy()
sub_food_inspection = food_inspection_multi_insp[['Establishment ID', 'Category', 'Inspection Date', 'Num_of_violations']]
sub_food_inspection.head()
Out[14]:
Establishment ID Category Inspection Date Num_of_violations
0 18628 Restaurant 2018-12-19 0.0
2 20933 Restaurant 2018-12-19 0.0
3 22217 Restaurant 2017-12-13 0.0
4 28408 Restaurant 2018-12-13 1.0
5 28011 Restaurant 2018-12-19 1.0
In [15]:
total_data = []
for idx, df in sub_food_inspection.groupby('Establishment ID'):
    row_data = [df['Establishment ID'].values[0],df['Category'].values[0]]
    dates_violations = list(zip(df['Inspection Date'],df['Num_of_violations']))
    for d_v in dates_violations:
        row_data.extend(d_v)
    total_data.append(row_data)
In [16]:
max_cnt = max([len(row) for row in total_data])
print(max_cnt)

columns = ['Establishment ID', 'Category']
for i in range((max_cnt-2)//2):
    columns.extend(['Inspection_date_%d'%(i+1),'Inspection_%d'%(i+1)])

est_to_datewise_insp_viols = pd.DataFrame(total_data, columns=columns)
est_to_datewise_insp_viols.head(10)
36
Out[16]:
Establishment ID Category Inspection_date_1 Inspection_1 Inspection_date_2 Inspection_2 Inspection_date_3 Inspection_3 Inspection_date_4 Inspection_4 Inspection_date_5 Inspection_5 Inspection_date_6 Inspection_6 Inspection_date_7 Inspection_7 Inspection_date_8 Inspection_8 Inspection_date_9 Inspection_9 Inspection_date_10 Inspection_10 Inspection_date_11 Inspection_11 Inspection_date_12 Inspection_12 Inspection_date_13 Inspection_13 Inspection_date_14 Inspection_14 Inspection_date_15 Inspection_15 Inspection_date_16 Inspection_16 Inspection_date_17 Inspection_17
0 63 Restaurant 2018-01-18 1.0 2019-05-30 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
1 71 Restaurant 2018-02-07 0.0 2019-02-21 2.0 2018-06-14 1.0 2018-06-27 0.0 2017-09-22 0.0 2018-10-18 2.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
2 83 Restaurant 2019-04-09 0.0 2018-03-14 1.0 2017-09-06 1.0 2018-10-01 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
3 91 Restaurant 2018-01-25 0.0 2019-04-10 0.0 2018-09-28 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
4 98 Restaurant 2018-12-27 0.0 2018-05-01 0.0 2017-10-24 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
5 99 Restaurant 2018-10-30 0.0 2018-03-13 0.0 2019-05-07 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
6 135 Restaurant 2017-12-20 0.0 2018-03-29 0.0 2019-06-25 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
7 157 Restaurant 2018-01-24 1.0 2018-10-30 0.0 2018-05-23 1.0 2017-09-21 1.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
8 163 Restaurant 2019-01-17 0.0 2019-05-13 0.0 2018-05-08 0.0 2018-09-17 1.0 2017-11-30 0.0 2017-11-22 0.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN
9 183 Restaurant 2018-03-27 2.0 2019-06-19 0.0 2018-04-26 1.0 2017-12-04 2.0 NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN NaT NaN

Exercise #2 - Summary Statistics/Grouped Data

Step 2.1 - What is the most common type of violation? The compliance categories are not mutually exclusive because one restaurant can have multiple violations. Create a table with the number of violations by violation type. Sort the table from the most common to least common violations.

In [17]:
violation_type_to_cnt = pd.DataFrame(food_inspection[food_inspection.Restaurant == True][compliance_type_columns].sum().sort_values(ascending=False)).rename(columns={0:'Num_of_violations'})
violation_type_to_cnt
Out[17]:
Num_of_violations
Cold Holding Temperature (C) 1865.0
Rodent and Insects 1266.0
Hot Holding Temperature (C) 559.0
Food Protected from Contamination (C) 406.0
Proper Hand Washing (C) 342.0
Trans Fat Ban 200.0
Cooling Time and Temperature (C) 174.0
No-Smoking Sign Posted 140.0
Reheating Time and Temperature (C) 64.0
Hot and Cold Running Water Provided (C) 56.0
Food from Approved Source (C) 38.0
Cooking Time and Temperature (C) 29.0
Proper Sewage Disposal (C) 20.0
Toxic Substances & Pesticides 4.0
Ill Workers Restricted (C) 2.0
Nutritional Labeling 0.0

Findings: From above we can see that Cold_holding_temperature is most common type of violation.

Step 2.2 - For establishments with multiple inspections, how many reinspections does it take for an establishment to become compliant? Create a table where each row is the number of inspections a restaurant has had and the columns are the number of reinspections until the establishment becomes compliant. Write 2-4 sentences with your observations. A mock up of this table is below (you will have more rows)

In [18]:
all_data = []
for i in range(2, (max_cnt-2)//2):
    sub_df = est_to_datewise_insp_viols[est_to_datewise_insp_viols['Inspection_%d'%i] == 0]
    row_data = [sub_df['Inspection_1'].sum(),]
    for j in range(2, i+1):
        row_data.append(sub_df['Inspection_%d'%j].sum())
    all_data.append(row_data)

max_cnt = max([len(row) for row in all_data])
index = ['%d Inspections'%(i+1) for i in range(len(all_data))]
columns = ['Never Compliant']
columns.extend(['Compliant After %d Re-Inspections'%(i+1) for i in range(max_cnt-1)])

no_violations_to_become_compliant = pd.DataFrame(all_data, columns = columns, index=index)
In [19]:
no_violations_to_become_compliant.loc['Total'] = no_violations_to_become_compliant[1:].sum()
no_violations_to_become_compliant['Total'] = no_violations_to_become_compliant.sum(axis=1)
no_violations_to_become_compliant[1:]
Out[19]:
Never Compliant Compliant After 1 Re-Inspections Compliant After 2 Re-Inspections Compliant After 3 Re-Inspections Compliant After 4 Re-Inspections Compliant After 5 Re-Inspections Compliant After 6 Re-Inspections Compliant After 7 Re-Inspections Compliant After 8 Re-Inspections Compliant After 9 Re-Inspections Compliant After 10 Re-Inspections Compliant After 11 Re-Inspections Compliant After 12 Re-Inspections Compliant After 13 Re-Inspections Compliant After 14 Re-Inspections Compliant After 15 Re-Inspections Total
2 Inspections 418.0 394.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 812.0
3 Inspections 299.0 286.0 304.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 889.0
4 Inspections 154.0 165.0 171.0 180.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 670.0
5 Inspections 84.0 97.0 90.0 86.0 106.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 463.0
6 Inspections 38.0 47.0 45.0 44.0 51.0 55.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 280.0
7 Inspections 22.0 20.0 21.0 27.0 18.0 20.0 25.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN 153.0
8 Inspections 9.0 17.0 6.0 13.0 8.0 13.0 15.0 5.0 0.0 NaN NaN NaN NaN NaN NaN NaN 86.0
9 Inspections 1.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 1.0 0.0 NaN NaN NaN NaN NaN NaN 5.0
10 Inspections 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN NaN 0.0
11 Inspections 1.0 5.0 2.0 3.0 0.0 4.0 2.0 2.0 2.0 5.0 3.0 0.0 NaN NaN NaN NaN 29.0
12 Inspections 1.0 5.0 2.0 3.0 0.0 4.0 2.0 2.0 2.0 5.0 3.0 0.0 0.0 NaN NaN NaN 29.0
13 Inspections 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN 0.0
14 Inspections 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN 0.0
15 Inspections 1.0 1.0 0.0 0.0 0.0 4.0 2.0 1.0 0.0 4.0 1.0 0.0 0.0 2.0 1.0 0.0 17.0
Total 1028.0 1037.0 641.0 356.0 183.0 102.0 46.0 11.0 5.0 14.0 7.0 0.0 0.0 2.0 1.0 0.0 3433.0

Findings: From above table we can see that many establishments keeps on failing to stay compliant. We can say that it takes around 7 re-inspections until establishment has become compliant. Very few are staying non-compliant after 6-7 inspections.

Exercise #3 - Data Visualization

For the data visualization tasks below, you may choose to use any Python visualization package you wish. Make sure all graphs are labeled appropriately. Limit your analysis to restaurants using the dummy variable indicator you created in 1.1.

Step 3.1 - Create a bar graph showing the results of 2.1.

In [20]:
with plt.style.context(('seaborn', 'ggplot')):
    violation_type_to_cnt.plot(kind='bar', width=0.8)
    plt.xlabel('Violation Type')
    plt.ylabel('Number Of Violations')
    plt.title('Number of Violations per Violation Type')

Step 3.2 - Create a line graph that shows the percent of restaurant inspections that have at least one violation by month and year. Are inspections getting harder or easier over time? Is there a particular month where more restaurants pass? Write 2-4 sentences with your observations.

In [21]:
percent_violation_per_month = food_inspection[food_inspection.Restaurant == True].groupby(by='Inspection_month').mean()[['Out_of_compliance']] * 100
percent_violation_per_month
Out[21]:
Out_of_compliance
Inspection_month
1 42.926829
2 41.534392
3 40.921053
4 40.620155
5 40.632054
6 40.892193
7 49.541284
8 53.907816
9 52.734375
10 44.662921
11 43.188854
12 41.935484
In [22]:
with plt.style.context(('seaborn', 'ggplot')):
    percent_violation_per_month.plot()
    plt.ylabel('% of Violations')
    plt.xticks(range(12), range(1,13))
    plt.title('% of Violations Per Month')

Findings: We can see from above graph that restaurants have less violations during year end and start. It has noticed violations keeps on going down till July and then it start increasing again peeking during September and then drops again

In [23]:
percent_violation_per_year = food_inspection[food_inspection.Restaurant == True].groupby(by='Inspection_year').mean()[['Out_of_compliance']] * 100
percent_violation_per_year
Out[23]:
Out_of_compliance
Inspection_year
2017 50.390625
2018 43.958230
2019 40.541781
In [24]:
with plt.style.context(('seaborn', 'ggplot')):
    percent_violation_per_year.plot()
    plt.ylabel('% of Violations')
    mini, maxi = int(percent_violation_per_year.index[0]), int(percent_violation_per_year.index[-1])+1
    plt.title('% of Violations Per Year')
    #plt.xticks(range(3), [2017,2018,2019])

Findings: From above graph, we can see that restaurants violations are going down year by year and in 2019 are quite below 40%.

In [26]:
out_of_compliance_count_per_rest = food_inspection[food_inspection.Restaurant == True].groupby(by='Establishment ID').sum()[['Out_of_compliance']].rename(columns={'Out_of_compliance': 'Out_of_compliance_Cnt'})
food_inspection_locations = food_inspection.drop_duplicates(subset=['Establishment ID']).set_index('Establishment ID')[['Location']]
non_compliance_count_per_rest_with_loc = out_of_compliance_count_per_rest.join(food_inspection_locations, how='left')[['Out_of_compliance_Cnt', 'Location']]
non_compliance_count_per_rest_with_loc = non_compliance_count_per_rest_with_loc.dropna()
non_compliance_count_per_rest_with_loc.head()
Out[26]:
Out_of_compliance_Cnt Location
Establishment ID
63 1 (39.0686, -77.1309)
71 3 (39.1615, -77.1597)
83 2 (38.9792, -77.0984)
91 0 (39.0578, -77.1127)
98 0 (39.0399, -77.0493)
In [27]:
print(Counter(non_compliance_count_per_rest_with_loc.Out_of_compliance_Cnt))
Counter({0: 583, 1: 578, 2: 412, 3: 244, 4: 139, 5: 85, 6: 33, 7: 10, 8: 4, 9: 1})
In [28]:
locations = np.array([loc.replace('(','').replace(')','').split(',') for loc in non_compliance_count_per_rest_with_loc.Location]).astype(np.float32)
locations[:10]
Out[28]:
array([[ 39.0686, -77.1309],
       [ 39.1615, -77.1597],
       [ 38.9792, -77.0984],
       [ 39.0578, -77.1127],
       [ 39.0399, -77.0493],
       [ 38.9906, -77.0246],
       [ 39.0961, -77.0489],
       [ 39.009 , -77.0409],
       [ 39.1499, -77.1981],
       [ 39.1846, -77.1888]], dtype=float32)
In [29]:
lng, lat = locations[:,0], locations[:,1]
non_compliance_count_per_rest_with_loc['lat'] = lat
non_compliance_count_per_rest_with_loc['lng'] = lng
non_compliance_count_per_rest_with_loc.head()
Out[29]:
Out_of_compliance_Cnt Location lat lng
Establishment ID
63 1 (39.0686, -77.1309) -77.130898 39.068600
71 3 (39.1615, -77.1597) -77.159698 39.161499
83 2 (38.9792, -77.0984) -77.098396 38.979198
91 0 (39.0578, -77.1127) -77.112701 39.057800
98 0 (39.0399, -77.0493) -77.049301 39.039902
In [30]:
mid_lat = (non_compliance_count_per_rest_with_loc.lat.min() + non_compliance_count_per_rest_with_loc.lat.max())/2
mid_lng = (non_compliance_count_per_rest_with_loc.lng.min() + non_compliance_count_per_rest_with_loc.lng.max())/2

folium_map = folium.Map(location=(mid_lat, mid_lng),
                        zoom_start=12,
                        tiles="CartoDB dark_matter")

for e_id in non_compliance_count_per_rest_with_loc.index:
    folium.CircleMarker(location=(non_compliance_count_per_rest_with_loc.loc[e_id]['lat'], non_compliance_count_per_rest_with_loc.loc[e_id]['lng']),
                            radius=float(non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt']+1),
                            color='red' if non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt'] > 0 else 'white',
                            popup=str(e_id) + ':' + str(non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt']),
                            tooltip=str(e_id) + ', Violations : ' + str(non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt']),
                            fill=True).add_to(folium_map)

folium_map.save('loc_wise_map.html')
In [31]:
mid_lat = (non_compliance_count_per_rest_with_loc.lat.min() + non_compliance_count_per_rest_with_loc.lat.max())/2
mid_lng = (non_compliance_count_per_rest_with_loc.lng.min() + non_compliance_count_per_rest_with_loc.lng.max())/2


fig = go.Figure()
    
for e_id in non_compliance_count_per_rest_with_loc.index:
    cnt = non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt']
    fig.add_trace(go.Scattermapbox(
        lat= [non_compliance_count_per_rest_with_loc.loc[e_id].lat],
        lon=[non_compliance_count_per_rest_with_loc.loc[e_id].lng],
        mode='markers',
        marker=go.scattermapbox.Marker(
            size= 8 if cnt == 0 else 12,
            color = 'white' if cnt==0 else 'red',
            opacity=0.7
        ),
        text=[str(e_id) + ', Violations : ' + str(non_compliance_count_per_rest_with_loc.loc[e_id]['Out_of_compliance_Cnt'])],
        
    ))

fig.update_layout(
    title='Violation Count distribution',
    autosize=True,
    hovermode='closest',
    showlegend=False,
    mapbox=go.layout.Mapbox(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=go.layout.mapbox.Center(
            lat=mid_lat,
            lon=mid_lng
        ),
        pitch=0,
        zoom=9,
        style = 'dark'
    ),
)

fig.show()
In [32]:
food_inspection[food_inspection.Restaurant == True].groupby('City').sum()[compliance_type_columns].sort_values(compliance_type_columns, ascending=False)
Out[32]:
Food from Approved Source (C) Food Protected from Contamination (C) Ill Workers Restricted (C) Proper Hand Washing (C) Cooling Time and Temperature (C) Cold Holding Temperature (C) Hot Holding Temperature (C) Cooking Time and Temperature (C) Reheating Time and Temperature (C) Hot and Cold Running Water Provided (C) Proper Sewage Disposal (C) Toxic Substances & Pesticides Rodent and Insects Nutritional Labeling Trans Fat Ban No-Smoking Sign Posted
City
GAITHERSBURG 22 100 0 183 87 360 117 10 15 17 9 0 340 0.0 32 22
SILVER SPRING 5 27 0 19 9 282 95 3 11 9 3 1 253 0.0 18 21
BETHESDA 4 142 0 32 8 402 119 3 12 12 2 2 158 0.0 36 22
POTOMAC 2 21 0 22 12 106 20 0 0 2 2 0 24 0.0 0 0
BURTONSVILLE 2 0 0 0 0 6 1 0 0 1 0 0 28 0.0 0 0
GERMANTOWN 1 15 0 24 13 114 32 0 1 3 0 0 81 0.0 2 1
CHEVY CHASE 1 7 0 1 1 40 14 0 0 1 0 0 16 0.0 6 1
DICKERSON 1 0 0 0 1 0 0 0 0 0 0 0 0 0.0 1 1
ROCKVILLE 0 50 1 40 39 338 78 11 16 7 3 0 241 0.0 93 64
OLNEY 0 20 0 7 2 65 27 0 2 1 0 0 21 0.0 2 0
DERWOOD 0 7 0 2 0 13 6 1 0 1 0 0 9 0.0 4 3
TAKOMA PARK 0 4 1 0 0 23 3 0 0 0 0 0 23 0.0 1 2
CABIN JOHN 0 2 0 0 1 9 3 0 0 0 0 0 1 0.0 2 2
WHEATON 0 2 0 0 0 37 17 0 0 2 1 0 23 0.0 0 0
ASHTON 0 2 0 0 0 8 2 0 0 0 0 0 1 0.0 0 0
BELTSVILLE 0 2 0 0 0 0 1 0 0 0 0 0 3 0.0 0 0
CLARKSBURG 0 1 0 3 1 10 7 0 1 0 0 0 8 0.0 0 0
KENSINGTON 0 1 0 2 0 13 1 1 2 0 0 0 4 0.0 1 0
DAMASCUS 0 1 0 2 0 10 6 0 2 0 0 0 11 0.0 0 0
ASPEN HILL 0 1 0 1 0 2 0 0 0 0 0 0 2 0.0 0 0
GLEN ECHO 0 1 0 0 0 2 2 0 0 0 0 1 0 0.0 0 0
MONTGOMERY VILLAGE 0 0 0 1 0 4 3 0 0 0 0 0 4 0.0 0 0
BROOKEVILLE 0 0 0 1 0 4 0 0 0 0 0 0 2 0.0 0 0
MONTG. VILLAGE 0 0 0 1 0 2 1 0 0 0 0 0 1 0.0 0 0
N. BETHESDA 0 0 0 1 0 2 1 0 0 0 0 0 1 0.0 0 0
POOLESVILLE 0 0 0 0 0 8 1 0 2 0 0 0 5 0.0 1 0
LAYTONSVILLE 0 0 0 0 0 3 2 0 0 0 0 0 4 0.0 0 0
NORTH BETHESDA 0 0 0 0 0 1 0 0 0 0 0 0 0 0.0 1 1
N. POTOMAC 0 0 0 0 0 1 0 0 0 0 0 0 0 0.0 0 0
BEALLSVILLE 0 0 0 0 0 0 0 0 0 0 0 0 1 0.0 0 0
COMUS 0 0 0 0 0 0 0 0 0 0 0 0 1 0.0 0 0
GARRETT PARK 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
LANGLEY PARK 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
MT. AIRY 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0
SANDY SPRING 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0 0 0

Findings: After looking at interactive visualization, we can come to conclusion that majority of violations count per establishment stands around 5-7. We can see that there are few areas where there are clusters of establishments which are defaulters with around 5-7 violations like Berwyn Heights,College Park, Kolbes Corner, RITCHIE, Hilcrest Heights, Upper Marlboro, Clinton, and few more.

We also noticed that cities like RIVERDALE, LAUREL,BOWIE, UPPER MARLBORO, COLLEGE PARK,CLITON, HYATTSVILLE, LANDOVER, BLADENSBURG, FORT WASHINGTON, WEST HYATTSVILLE, NEW CARROLLTON are few cities where all violations are quite common.

Food_from_approved_source, Ill_workers_restricted, Cooling_time_and_temperature, Cooking_time_and_temperature, Reheating_time_and_temperature, Hot_and_cold_running_water_provided, Proper_sewage_disposal, No_bare_hand_contact, Adequate_hand_washing_facilities are very least occurring violations.

Food_protected_from_contamination, Proper_hand_washing, Cold_holding_temperature, Hot_holding_temperature, Rodent_and_insects, Food_contact_surfaces_and_equipment are quite commonly occurring violations.

Exercise #4 - All the photos! Sort of...

This assignment builds off of previous Exercise from weeks ago, scraping data from https://pokemondb.net

Step 4.1: Start with your deduplicated pokedex data. Just like in Step 2.4 from HW #4, create a new column in the DataFrame called sample2 that tags every 12th pokemon. (This will reduce our sample size.)

In [38]:
soup = BeautifulSoup(requests.get('https://pokemondb.net/pokedex/all').content, 'html.parser')
all_rows = soup.find_all('tr')

def create_data_frame(all_rows):
    header_names = []
    rows_data = []
    for row in all_rows:
        single_row_data = []
        headers = row.find_all('th')
        for header in headers:
            header_names.append(header.text)
            if header.text == 'Name':
                header_names.append('URL')
        row_content = row.find_all('td')
        for val in row_content:
            single_row_data.append(val.text)
            if val.get('class')[0] == 'cell-name':
                hyperlink = val.find('a')
                single_row_data.append(hyperlink['href'])
        if single_row_data:
            rows_data.append(single_row_data)

    return pd.DataFrame(rows_data, columns=header_names)

final_df = create_data_frame(all_rows)

final_df['#'] = final_df['#'].astype('int')
final_df['Total'] = final_df['Total'].astype('int')
final_df['HP'] = final_df['HP'].astype('int')
final_df['Attack'] = final_df['Attack'].astype('int')
final_df['Defense'] = final_df['Defense'].astype('int')
final_df['Sp. Atk'] = final_df['Sp. Atk'].astype('int')
final_df['Sp. Def'] = final_df['Sp. Def'].astype('int')
final_df['Speed'] = final_df['Speed'].astype('int')
final_df['Type'] = final_df['Type'].str.strip()

types = []
for val in final_df.Type:
    types.extend(val.strip().split(' '))
unique_types = list(set(types))

types_data = []
for row in final_df.Type:
    row_type_dummy = [1 if typ in row else 0for typ in unique_types]
    types_data.append(row_type_dummy)

types_data = np.array(types_data)

for i, typ in enumerate(unique_types):
    final_df[typ] = types_data[:, i]

final_df = final_df.drop_duplicates(subset=['URL'])
final_df.head(10)
Out[38]:
# Name URL Type Total HP Attack Defense Sp. Atk Sp. Def Speed Flying Fairy Electric Dragon Rock Ghost Grass Ice Normal Fire Fighting Dark Water Steel Psychic Bug Poison Ground
0 1 Bulbasaur /pokedex/bulbasaur Grass Poison 318 45 49 49 65 65 45 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
1 2 Ivysaur /pokedex/ivysaur Grass Poison 405 60 62 63 80 80 60 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
2 3 Venusaur /pokedex/venusaur Grass Poison 525 80 82 83 100 100 80 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
4 4 Charmander /pokedex/charmander Fire 309 39 52 43 60 50 65 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
5 5 Charmeleon /pokedex/charmeleon Fire 405 58 64 58 80 65 80 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6 6 Charizard /pokedex/charizard Fire Flying 534 78 84 78 109 85 100 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
9 7 Squirtle /pokedex/squirtle Water 314 44 48 65 50 64 43 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
10 8 Wartortle /pokedex/wartortle Water 405 59 63 80 65 80 58 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
11 9 Blastoise /pokedex/blastoise Water 530 79 83 100 85 105 78 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
13 10 Caterpie /pokedex/caterpie Bug 195 45 30 35 20 20 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
In [39]:
final_df['Sample'] = ['sample' if i%4 == 0 else None for i in final_df['#']]
final_df['Sample2'] = ['sample' if i%12 == 0 else None for i in final_df['#']]
final_df.head(15)
Out[39]:
# Name URL Type Total HP Attack Defense Sp. Atk Sp. Def Speed Flying Fairy Electric Dragon Rock Ghost Grass Ice Normal Fire Fighting Dark Water Steel Psychic Bug Poison Ground Sample Sample2
0 1 Bulbasaur /pokedex/bulbasaur Grass Poison 318 45 49 49 65 65 45 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 None None
1 2 Ivysaur /pokedex/ivysaur Grass Poison 405 60 62 63 80 80 60 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 None None
2 3 Venusaur /pokedex/venusaur Grass Poison 525 80 82 83 100 100 80 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 None None
4 4 Charmander /pokedex/charmander Fire 309 39 52 43 60 50 65 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 sample None
5 5 Charmeleon /pokedex/charmeleon Fire 405 58 64 58 80 65 80 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 None None
6 6 Charizard /pokedex/charizard Fire Flying 534 78 84 78 109 85 100 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 None None
9 7 Squirtle /pokedex/squirtle Water 314 44 48 65 50 64 43 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 None None
10 8 Wartortle /pokedex/wartortle Water 405 59 63 80 65 80 58 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 sample None
11 9 Blastoise /pokedex/blastoise Water 530 79 83 100 85 105 78 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 None None
13 10 Caterpie /pokedex/caterpie Bug 195 45 30 35 20 20 45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 None None
14 11 Metapod /pokedex/metapod Bug 205 50 20 55 25 25 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 None None
15 12 Butterfree /pokedex/butterfree Bug Flying 395 60 45 50 90 80 70 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 sample sample
16 13 Weedle /pokedex/weedle Bug Poison 195 40 35 30 20 20 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 None None
17 14 Kakuna /pokedex/kakuna Bug Poison 205 45 25 50 25 25 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 None None
18 15 Beedrill /pokedex/beedrill Bug Poison 395 65 90 40 45 80 75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 None None

Step 4.2: Scrape the photo URLs for each pokemon in sample2 like in Step 3.1 from HW #4.

Step 4.3: Display the photos for all of the sample2 pokemon in your Jupyter notebook. Example code for how to display multiple images in a loop is below. Display(Image(url=_, unconfined=True))

In [40]:
for pokemon_name in final_df[final_df.Sample2 == 'sample'].Name.values:
    soup = BeautifulSoup(requests.get('https://pokemondb.net/pokedex/%s'%pokemon_name).content)
    try:
        image_url = soup.find_all('img')[0]['src']
    except:
        image_url = None
    #image_content = requests.get(image_url).content
    if image_url:
        display(Image(url=image_url, unconfined=True))

Exercise #5 - Rescrape and Clean the Location Table

Step5.1: Using BeautifulSoup, scrape the Location table for Bulbasaur properly and add it to a DataFrame. When multiple columns are combined, separate the columns and duplicate the location information. For example, the column "Red Blue" that contains "Pallet Town" will become two columns-- "Red" with location "Pallet Town" and "Blue" with location "Pallet Town." Transpose the DataFrame so the video game (e.g. "Red") is the column name.

In [41]:
pokemon_name = 'bulbasaur'
url = 'https://pokemondb.net/pokedex/%s'%pokemon_name
tables = pd.read_html(requests.get(url, headers={'User-agent': 'Mozilla/5.0'}).text)
tables[-2]
Out[41]:
0 1
0 RedBlue Pallet Town
1 Yellow Cerulean City
2 GoldSilverCrystal Trade/migrate from another game
3 RubySapphire Trade/migrate from another game
4 FireRedLeafGreen Pallet Town
5 Emerald Trade/migrate from another game
6 DiamondPearlPlatinum Trade/migrate from another game
7 HeartGoldSoulSilver Pallet Town
8 BlackWhiteBlack 2White 2 Trade/migrate from another game
9 XY Lumiose City
10 Omega RubyAlpha Sapphire Trade/migrate from another game
11 SunMoon Trade/migrate from another game
12 Ultra SunUltra Moon Route 2
13 Let's Go PikachuLet's Go Eevee Cerulean City, Viridian Forest
In [42]:
loc_cols = ['Red','Blue', 'Yellow', 'Gold', 'Silver', 'Crystal', 'Ruby', 'Sapphire', 'Emerald', 'FireRed', 'LeafGreen',\
       'Diamond', 'Pearl', 'Platinum', 'HeartGold', 'SoulSilver','Black', 'White', 'Black 2', 'White 2', 'X','Y',\
       'Omega Ruby', 'Alpha Sapphire', "Let's Go Pikachu", "Let's Go Eevee"]

game_dataframe = pd.DataFrame([tables[-2][1].values], columns=tables[-2][0].values)
game_dataframe
Out[42]:
RedBlue Yellow GoldSilverCrystal RubySapphire FireRedLeafGreen Emerald DiamondPearlPlatinum HeartGoldSoulSilver BlackWhiteBlack 2White 2 XY Omega RubyAlpha Sapphire SunMoon Ultra SunUltra Moon Let's Go PikachuLet's Go Eevee
0 Pallet Town Cerulean City Trade/migrate from another game Trade/migrate from another game Pallet Town Trade/migrate from another game Trade/migrate from another game Pallet Town Trade/migrate from another game Lumiose City Trade/migrate from another game Trade/migrate from another game Route 2 Cerulean City, Viridian Forest
In [43]:
row_data = []
for col in loc_cols:
    col_data_found = False
    for g_col in game_dataframe.columns:
        if col in g_col:
            row_data.append(game_dataframe[g_col].values[0])
            col_data_found = True
            break
    if not col_data_found:
        row_data.append(None)

game_dataframe_modified = pd.DataFrame([row_data], columns=loc_cols)
game_dataframe_modified
Out[43]:
Red Blue Yellow Gold Silver Crystal Ruby Sapphire Emerald FireRed LeafGreen Diamond Pearl Platinum HeartGold SoulSilver Black White Black 2 White 2 X Y Omega Ruby Alpha Sapphire Let's Go Pikachu Let's Go Eevee
0 Pallet Town Pallet Town Cerulean City Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Lumiose City Cerulean City Trade/migrate from another game Trade/migrate from another game Cerulean City, Viridian Forest Cerulean City, Viridian Forest

Step 5.2: Generalize Step 5.1 for all pokemon in the original sample (the sample from Step 5.1 from HW #4). Append all the data together. If information is not available for a pokemon in a particular game, the column should contain NaN. Display the entire table.

In [44]:
pokemons, game_dataframes = [], []
for pokemon_name in final_df[final_df.Sample == 'sample'].Name:
    url = 'https://pokemondb.net/pokedex/%s'%pokemon_name
    try:
        tables = pd.read_html(requests.get(url, headers={'User-agent': 'Mozilla/5.0'}, timeout=3).text)
        game_dataframe = pd.DataFrame([tables[-2][1].values], columns=tables[-2][0].values)
        game_dataframe.columns = [col.strip() for col in game_dataframe.columns]
    except:
        continue
    if game_dataframe.shape[0]>0:
        game_dataframes.append(game_dataframe)
        pokemons.append(pokemon_name)

print('Number of game dataframes %d'%len(game_dataframes))
Number of game dataframes 190
In [45]:
total_data = []
for pokemon, df in zip(pokemons, game_dataframes):
    row_data = [pokemon]
    for col in loc_cols:
        col_data_found = False
        for g_col in df.columns:
            if col in g_col:
                row_data.append(df[g_col].values[0])
                col_data_found = True
                break
        if not col_data_found:
            row_data.append(None)
    total_data.append(row_data)

locations_data = pd.DataFrame(total_data, columns = ['Name'] + loc_cols).set_index('Name')
locations_data
Out[45]:
Red Blue Yellow Gold Silver Crystal Ruby Sapphire Emerald FireRed LeafGreen Diamond Pearl Platinum HeartGold SoulSilver Black White Black 2 White 2 X Y Omega Ruby Alpha Sapphire Let's Go Pikachu Let's Go Eevee
Name
Charmander Pallet Town Pallet Town Route 24 Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Lumiose City Route 24 Trade/migrate from another game Trade/migrate from another game Route 3, 4, 24, Rock Tunnel Route 3, 4, 24, Rock Tunnel
Wartortle Evolve Squirtle Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle
Butterfree Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Route 2, 26, 27, 34, 35, 36, 37, 38, 39, Azale... National Park Route 2, 24, 25, Ilex Forest, National Park Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Route 2, 47, Ilex Forest, National Park, Virid... National Park Trade/migrate from another game Route 12 Trade/migrate from another game Trade/migrate from another game Evolve Caterpie/Metapod Evolve Caterpie/Metapod Trade/migrate from another game Trade/migrate from another game Viridian Forest Evolve Caterpie/Metapod
Pidgey Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 21,... Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 21,... Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 21, 24, 25... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 25, 29, 30, 31, 32, 34, 35, 36,... Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 24,... Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 24,... Route 229 Route 229 Route 229 Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Trade/migrate from another game White Forest Trade/migrate from another game Trade/migrate from another game Route 2, 3 Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 21, 24, 25... Breed Pidgeotto/Pidgeot Breed Pidgeotto/Pidgeot Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16... Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16...
Raticate Route 16, 17, 18, 21 Route 16, 17, 18, 21 Route 9, 10, 11, 16, 18, 21, Pokémon Mansion Route 7, 9, 10, 26, 27, 38, 39, Burned Tower, ... Route 7, 9, 10, 26, 27, 38, 39, Burned Tower, ... Route 1, 3, 4, 6, 7, 9, 10, 11, 21, 26, 27, 38... Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Route 16, 17, 18, Pokémon Mansion Route 16, 17, 18, Pokémon Mansion Route 225, 226 Route 225, 226 Route 225, 226 Route 7, 9, 10, 26, 27, 38, 39, 47, Burned Tow... Route 7, 9, 10, 26, 27, 38, 39, 47, Burned Tow... Dreamyard Dreamyard Dreamyard, Relic Passage, Strange House Dreamyard, Relic Passage, Strange House Trade/migrate from another game Route 9, 10, 11, 16, 18, 21, Pokémon Mansion Route 118 Route 118 Route 7, 8, 9, 10, 11, 16, 17, 18, 21, Pokémon... Route 7, 8, 9, 10, 11, 16, 17, 18, 21, Pokémon...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Kommo-o None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Lunala None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Xurkitree None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Necrozma None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Naganadel None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available

190 rows × 26 columns

Exercise #6 - Location Analysis Done Right

Step 6.1: Join the pokedex data to the location DataFrame created in Step 5.2 above. (Exclude pokemon that are not in the sample.) For the locations in pokemon game X, calculate the average total points for each location. Which location has the highest average total point score?

In [46]:
final_df_sample = final_df[final_df['Sample'] == 'sample'].set_index('Name')
print('Sample Dataset Size : ',final_df_sample.shape)
final_df_with_loc_data = final_df_sample.join(locations_data, how='left')
final_df_with_loc_data
Sample Dataset Size :  (202, 30)
Out[46]:
# URL Type Total HP Attack Defense Sp. Atk Sp. Def Speed Flying Fairy Electric Dragon Rock Ghost Grass Ice Normal Fire Fighting Dark Water Steel Psychic Bug Poison Ground Sample Sample2 Red Blue Yellow Gold Silver Crystal Ruby Sapphire Emerald FireRed LeafGreen Diamond Pearl Platinum HeartGold SoulSilver Black White Black 2 White 2 X Y Omega Ruby Alpha Sapphire Let's Go Pikachu Let's Go Eevee
Name
Charmander 4 /pokedex/charmander Fire 309 39 52 43 60 50 65 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 sample None Pallet Town Pallet Town Route 24 Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Pallet Town Pallet Town Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Lumiose City Route 24 Trade/migrate from another game Trade/migrate from another game Route 3, 4, 24, Rock Tunnel Route 3, 4, 24, Rock Tunnel
Wartortle 8 /pokedex/wartortle Water 405 59 63 80 65 80 58 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 sample None Evolve Squirtle Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle Trade/migrate from another game Trade/migrate from another game Evolve Squirtle Evolve Squirtle
Butterfree 12 /pokedex/butterfree Bug Flying 395 60 45 50 90 80 70 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 sample sample Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Route 2, 26, 27, 34, 35, 36, 37, 38, 39, Azale... National Park Route 2, 24, 25, Ilex Forest, National Park Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Evolve Caterpie/Metapod Route 2, 47, Ilex Forest, National Park, Virid... National Park Trade/migrate from another game Route 12 Trade/migrate from another game Trade/migrate from another game Evolve Caterpie/Metapod Evolve Caterpie/Metapod Trade/migrate from another game Trade/migrate from another game Viridian Forest Evolve Caterpie/Metapod
Pidgey 16 /pokedex/pidgey Normal Flying 251 40 45 40 35 35 56 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 sample None Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 21,... Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 21,... Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 21, 24, 25... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 25, 29, 30, 31, 32, 34, 35, 36,... Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 24,... Route 1, 2, 3, 5, 6, 7, 8, 12, 13, 14, 15, 24,... Route 229 Route 229 Route 229 Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Route 1, 2, 5, 6, 25, 29, 30, 31, 35, 36, 37, ... Trade/migrate from another game White Forest Trade/migrate from another game Trade/migrate from another game Route 2, 3 Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 21, 24, 25... Breed Pidgeotto/Pidgeot Breed Pidgeotto/Pidgeot Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16... Route 1, 2, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16...
Raticate 20 /pokedex/raticate Normal 413 55 81 60 50 70 97 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 sample None Route 16, 17, 18, 21 Route 16, 17, 18, 21 Route 9, 10, 11, 16, 18, 21, Pokémon Mansion Route 7, 9, 10, 26, 27, 38, 39, Burned Tower, ... Route 7, 9, 10, 26, 27, 38, 39, Burned Tower, ... Route 1, 3, 4, 6, 7, 9, 10, 11, 21, 26, 27, 38... Trade/migrate from another game Trade/migrate from another game Trade/migrate from another game Route 16, 17, 18, Pokémon Mansion Route 16, 17, 18, Pokémon Mansion Route 225, 226 Route 225, 226 Route 225, 226 Route 7, 9, 10, 26, 27, 38, 39, 47, Burned Tow... Route 7, 9, 10, 26, 27, 38, 39, 47, Burned Tow... Dreamyard Dreamyard Dreamyard, Relic Passage, Strange House Dreamyard, Relic Passage, Strange House Trade/migrate from another game Route 9, 10, 11, 16, 18, 21, Pokémon Mansion Route 118 Route 118 Route 7, 8, 9, 10, 11, 16, 17, 18, 21, Pokémon... Route 7, 8, 9, 10, 11, 16, 17, 18, 21, Pokémon...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Lunala 792 /pokedex/lunala Psychic Ghost 680 137 113 89 137 107 97 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 sample sample None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Xurkitree 796 /pokedex/xurkitree Electric 570 83 89 71 173 71 83 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 sample None None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Necrozma 800 /pokedex/necrozma Psychic 600 97 107 101 127 89 79 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 sample None None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Naganadel 804 /pokedex/naganadel Poison Dragon 540 73 73 73 127 73 121 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 sample sample None None None None None None None None None None None None None None None None None None None None None None None None Location data not yet available Location data not yet available
Meltan 808 /pokedex/meltan Steel 300 46 65 65 55 35 34 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 sample None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

202 rows × 56 columns

In [47]:
avg_total_point_per_location = final_df_with_loc_data.groupby(by='X').mean()[['Total']]
avg_total_point_per_location.head()
Out[47]:
Total
X
Ambrette Town 355.5
Breed Electrode 330.0
Breed Gurdurr/Conkeldurr 305.0
Breed Hariyama 237.0
Breed Haunter/Gengar 310.0
In [48]:
avg_total_point_per_location.sort_values(by='Total', ascending=False).head()
Out[48]:
Total
X
Team Flare HQ 680.0
Sea Spirit's Den, Roaming Kalos 580.0
Evolve Axew/Fraxure 540.0
Evolve Rhyhorn/Rhydon 535.0
Evolve Vanillite/Vanillish 535.0

Location "Team Flare HQ" has highest total as we can see above from sorted values.

In [ ]: